These slides, code and all materials can be found here: https://github.com/bwlewis/dnn_notes
DNNs have been amazingly successful in some applications…
See also Elad, “Deep, deep trouble” in SIAM News (https://sinews.siam.org/Details-Page/deep-deep-trouble-4)
DNNs have spawned ingenious computing advances…
And yet they are not without problems…
Szegedy et al., “Intriguing properties of neural networks” (https://arxiv.org/pdf/1312.6199.pdf)
Ostrich image (c) Milos Andera, www.naturfoto.cz
Sharif et al., “Accessorize to a Crime: Real and Stealthy Attacks on State-of-the-Art Face Recognition” (https://www.cs.cmu.edu/~sbhagava/papers/face-rec-ccs16.pdf)
Examples of DNNs…
Supervised
Unsupervised
Reinforcement
“A new programming paradigm?”
From François Chollet and J. J. Allaire, “Deep Learning with R”
Now this doesn’t look that new to me!
Consider, for instance, good old ordinary least squares…
Rows of X are p-dimensional observations; the variables (aka ‘features’) run along the columns.
For coefficients W (‘weights’) in R^p and a scalar coefficient b (‘bias’), OLS models y as XW + b.
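In symbols (a sketch in the notation above, with 1 denoting a vector of ones), OLS solves:

```latex
\min_{W \in \mathbb{R}^{p},\; b \in \mathbb{R}}\ \lVert X W + b\mathbf{1} - y \rVert_2^2
```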
A one-layer ‘network’ looks almost the same: for coefficients W and a scalar coefficient b, it simply passes XW + b through a nonlinear activation σ.
So, maybe not so different after all?
The σ functions are nonlinear; typically non-negative thresholding (e.g., ReLU) or sigmoid functions. Minimization of the resulting objective functionals is a nonlinear and likely non-convex problem.
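For instance, such objective functionals might be written (a sketch; subscripts index layers):

```latex
\min_{W,\, b}\ \lVert \sigma(X W + b\mathbf{1}) - y \rVert_2^2,
\qquad
\min_{W_1, b_1, W_2, b_2}\ \lVert \sigma_2\!\left(\sigma_1(X W_1 + b_1\mathbf{1})\, W_2 + b_2\mathbf{1}\right) - y \rVert_2^2
```

The second form stacks two layers; deep networks continue the composition.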
Very often solved with stochastic gradient descent (SGD): gradient steps computed from one randomly chosen observation, or a small batch of observations, at a time.
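A minimal illustrative sketch of SGD in R for a one-layer sigmoid model; the toy data, squared-error loss, and learning rate here are assumptions for illustration, not from the slides:

```r
# One-layer model: y ~ sigmoid(X w + b), fit by stochastic gradient descent.
set.seed(1)
sigmoid <- function(z) 1 / (1 + exp(-z))

n <- 200; p <- 2
X <- matrix(rnorm(n * p), n, p)
y <- as.numeric(X[, 1] - X[, 2] > 0)      # a simple separable toy target

w <- rep(0, p); b <- 0; rate <- 0.5
for (epoch in 1:50) {
  for (i in sample(n)) {                  # one observation at a time: "stochastic"
    s <- sigmoid(sum(X[i, ] * w) + b)
    g <- 2 * (s - y[i]) * s * (1 - s)     # gradient of squared error w.r.t. X w + b
    w <- w - rate * g * X[i, ]
    b <- b - rate * g
  }
}
acc <- mean((sigmoid(X %*% w + b) > 0.5) == y)  # training accuracy
```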
See the (ultra basic) example R code at http://illposed.net/deep_nnet_example.r
Neural networks in practice are elegantly specified. Here is an example model definition in R using Keras:
```r
model <- keras_model_sequential() %>%
  layer_dense(units = 256, activation = 'relu', input_shape = c(784)) %>%
  layer_dropout(rate = 0.4) %>%
  layer_dense(units = 128, activation = 'relu') %>%
  layer_dropout(rate = 0.3) %>%
  layer_dense(units = 10, activation = 'softmax')

model %>% compile(
  loss = 'categorical_crossentropy',
  optimizer = optimizer_rmsprop(),
  metrics = c('accuracy')
)
```

This model can be trained on any architecture (CPU, GPU, TPU, cloud, laptop, …). And, once trained, exported to any platform (server, desktop, phone, …).
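For completeness, here is one way training might look with the Keras R package; the MNIST data and preprocessing are assumptions chosen to match the 784-input, 10-output model above, and the hyperparameters are purely illustrative:

```r
library(keras)

# Load MNIST, flatten the 28x28 images to length-784 vectors, scale to [0, 1].
mnist <- dataset_mnist()
x_train <- array_reshape(mnist$train$x, c(nrow(mnist$train$x), 784)) / 255
y_train <- to_categorical(mnist$train$y, 10)

# Train the compiled model defined above.
model %>% fit(x_train, y_train,
              epochs = 10, batch_size = 128, validation_split = 0.2)
```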
But,
| Pattern | Use |
| --- | --- |
| Lots of observations (rows) → empirical distributions emerge for each variable. | The bootstrap, and in particular the big-data bootstrap (Jordan et al.); batched methods; matching (Rubin, King, etc.); … Recall, for instance, batched SGD. |
| Lots of variables (columns) → dependencies emerge between variables. | Dimensionality reduction, subspace projection, … For instance, dropout. |
Directly project the nonlinear operator into a subspace and solve the projected problem? (Potentially more computationally efficient, plus a regularizing effect.)
Errors-in-variables models (non-fixed X)? (Total least squares-like optimization objectives.)